Abstract: Like most South Asian languages, Urdu and
Sindhi are partial word order languages. Conventional
syntax representation models such as Context-Free Grammars
(CFGs) cannot adequately handle partial word order
syntax. Linear Specification Language (LSL) is an extension
of CFGs that allows an arbitrary partial order (free word
order) on the right-hand side of a grammar rule. LSL
handles partial word order through different types of
linear precedence (LP) constraints, which make it capable
of representing the syntax of partial word order sentences.
This paper discusses the issues involved in representing
Urdu and Sindhi sentences and their constituent parts in
LSL, and presents LSL grammars for different types of
Urdu and Sindhi sentences.
1. INTRODUCTION


Urdu and Sindhi sentences have a complex syntactic
structure with partial word order. A partial word order
sentence may have many proper orders of its constituent
parts. By imposing a few restrictions on the word order of
these languages, one can represent controlled word order
sentences in conventional syntax representation models
such as Context-Free Grammars (CFGs) [1][2][3]. CFGs,
however, are suitable only for controlled or fixed word
order sentences; they do not model partial word order
efficiently [4].

Linear Specification Language (LSL) [5] is an extension
of CFG that modifies the right-hand side of the grammar
rule. The modification takes the form of Linear Precedence
(LP) constraints, which are part of the grammar rule. The
LP constraints define precedence rules that specify the
constituent ordering of the sentence.
Local language sentences can have completely or partially 
free word orders [4]. For example the Urdu sentence “ +815 
cou” can have different word orders like “8\5 ube,” 
and “1 1815 JLax”. In the same way Sindhi sentence “ cle 
$1 24” can have the forms “sg (We 4” and “ba (le 
$s”. The following sections give the formal definition of
LSL and present the LSL syntax representation of Urdu and
Sindhi sentences with examples.

1.1. Formal Definition of LSL Grammar 
A Linear Specification Language grammar G = (V,
T, S, P, L) is defined in [6] as follows:
V: Set of variables (non-terminals)
T: Set of terminal symbols
S: Start symbol (S ∈ V)
L: Set of lexical entries (a lexical entry is a pair Y → a,
where Y ∈ V and a ∈ T)
P: Set of rules (productions)
Every LSL rule consists of the following two parts:
i. A two-place relation between a variable and a set of
variables
ii. Some linear precedence constraints (known as LP
constraints) between variables
For instance


S → V X Y Z; ---------------- (i)
V < X, X « Y, (V) ---------------- (ii)


where V, X, Y and Z are variables, (i) is a two-place
relation between variables and (ii) gives the LP
constraints between variables. The rule formed by (i) and
(ii) is referred to below as rule 1.1.


Three different types of LP constraints are given below:

▪ <: Weak precedence, written as V < X, states that the
terminal production of variable V lies completely to
the left of the terminal production of variable X.

▪ «: Immediate precedence, written as X « Y, states
that the rightmost terminal derived from X stands
immediately to the left of the leftmost terminal
derived from Y.

▪ (): Isolation, written as (V), states that the terminal
production of V is continuous. Sometimes the LHS
of the grammar rule can also be isolated, which
means that the complete derivation of the rule should
be continuous.


A rule without LP constraints (denoted by φ) states that
any ordering of V, X, Y and Z is allowed (free
constituent ordering).

Now suppose that the terminal yield of V is "this is",
that of X is "a", that of Y is "black" and that of Z is "book".

For a sentence to be grammatical according to rule 1.1, it
must hold that:


1. The terminal yield of V (this is) must occur to the
left of the terminal yield of X (a).

2. The terminal yield of X (a) must occur
immediately to the left of the terminal yield of Y
(black).

3. The yield of V (this is) must be continuous.

According to the three conditions above, which are
specified by the LP constraints of rule 1.1, the following
sentences are grammatically correct.

▪ this is a black book

▪ book this is a black

▪ this is book a black


All the above sentences are grammatically correct according
to the LSL rule given in 1.1 (but not according to English
grammar).
Now consider another sentence:

▪ a book this is black

The above sentence is invalid because it violates condition
(1), which says that V < X ("this is" should come before "a").


The example shows that the LSL formalism can control
partially free word order in sentences. This is the key
idea behind using LSL grammars for Urdu and Sindhi
sentences with relaxed word order.
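
The conditions of the example above can be checked mechanically. The following Python sketch is only an illustration and is not part of the LSL formalism of [5][6]. It locates each variable's terminal yield by word position and tests weak precedence, immediate precedence and isolation. It assumes that the immediate precedence constraint holds between X (a) and Y (black), which is the reading under which all three sentences listed above are accepted. The names YIELDS, positions and satisfies_rule_1_1 are hypothetical.

```python
from typing import Dict, List

# Terminal yields assumed for the running example (rule 1.1).
YIELDS: Dict[str, List[str]] = {
    "V": ["this", "is"],
    "X": ["a"],
    "Y": ["black"],
    "Z": ["book"],
}


def positions(tokens: List[str], words: List[str]) -> List[int]:
    """Sorted token positions occupied by a variable's terminal yield."""
    return sorted(tokens.index(w) for w in words)


def weak_precedes(a: List[int], b: List[int]) -> bool:
    """a < b: every terminal of a lies to the left of every terminal of b."""
    return max(a) < min(b)


def immediately_precedes(a: List[int], b: List[int]) -> bool:
    """a « b: rightmost terminal of a is directly left of leftmost terminal of b."""
    return max(a) + 1 == min(b)


def isolated(a: List[int]) -> bool:
    """(a): the terminals of a occupy one contiguous block of positions."""
    return max(a) - min(a) + 1 == len(a)


def satisfies_rule_1_1(sentence: str) -> bool:
    """Check the LP constraints V < X, X « Y and (V) on a sentence.

    Only the constraints of this one rule are checked; the internal word
    order of V would be fixed by V's own rule, which is not modelled here.
    """
    tokens = sentence.split()
    expected = sorted(w for ws in YIELDS.values() for w in ws)
    if sorted(tokens) != expected:  # each example word must occur exactly once
        return False
    pos = {var: positions(tokens, ws) for var, ws in YIELDS.items()}
    return (
        weak_precedes(pos["V"], pos["X"])
        and immediately_precedes(pos["X"], pos["Y"])
        and isolated(pos["V"])
    )


for s in ("this is a black book", "book this is a black",
          "this is book a black", "a book this is black"):
    print(s, "->", satisfies_rule_1_1(s))
```

Under these assumptions the first three sentences are accepted and the last is rejected, because "a" precedes "this is".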


2. LSL FOR URDU SENTENCES 


The example discussed below considers the grammar of
the Ismia [7] Urdu sentence with relaxed word order. The
grammar is an extension of the Urdu CFG presented
in [2].


Consider an LSL grammar for the Ismia Urdu sentence

G = (V, T, S, P, L)

where

Vi= {cele pk Glade «8 6 thse ¢ ail! situs ¢ aul clave Glee Alea 
crgeusl clatine «pails Jad } 

T= {4 1b (a ols (eal S Sle ol6 pase ch LS ce Spa « 

S = {rpeul Alea } 

Production rules (P) with LP constraints and lexical
entries (L) are given in Figure 1.

The LP constraints in rule 1 state that the constituent
parts of the Ismia sentence can appear in any order,
provided that Fail-e-Naqis always comes after Masand Alya.
Rules 2 and 3 have no LP constraints, which means that
their constituent parts can appear in any order. Rules 4
and 5 state that Isam is isolated and continuous (which may
not be the case for many other sentence types).

The following sentences are derivable using the above
LSL grammar rules and are therefore grammatically
correct.

m8 pb Gal § 

a olf jab 3 Gal § 

Sal se 


The constituent ordering according to the LP constraints
is shown in Tables 1, 2 and 3 respectively.


Table 1
[Rows: LP Constraint, Constituents, Sentence. The Urdu-script cell contents are not legible in this copy.]


[Rules 1 to 12, in Urdu script, are not legible in this copy.]
Figure 1. LSL rules for Ismia Urdu sentence


Table 2
[Rows: LP Constraint, Constituents, Sentence. The Urdu-script cell contents are not legible in this copy.]


Table 3
[Rows: LP Constraint, Constituents, Sentence. The Urdu-script cell contents are not legible in this copy.]

All the LP constraints of the LSL grammar of Figure 1
are satisfied in the above sentences. In all of them the
position of Masand is completely free, while the order of
Masand Alya and Fail-e-Naqis is only partially free. Because
of this LP constraint, the sentence “cS olf” is not
grammatically correct according to the above LSL grammar:
it violates the condition Masand Alya < Fail-e-Naqis.
The derivation tree for all correct sentences is given in
Figure 2. Because the partial word order is captured by the
LP constraints, only one derivation tree is formed for all
possible orders.
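
The single-tree property can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes that rule 1 has the three constituents named in the text (Masand Alya, Masand, Fail-e-Naqis), that its only LP constraint is Masand Alya < Fail-e-Naqis, and that each constituent is treated as an unbroken unit. The names permitted_orders and derivation_node are hypothetical.

```python
from itertools import permutations
from typing import FrozenSet, List, Tuple

LHS = "Jumla Ismia"
CONSTITUENTS = ("Masand Alya", "Masand", "Fail-e-Naqis")
# Weak precedence pairs (left, right): left must occur before right.
WEAK_PRECEDENCE = [("Masand Alya", "Fail-e-Naqis")]


def permitted_orders() -> List[Tuple[str, ...]]:
    """Constituent orders that satisfy every weak precedence constraint."""
    return [
        order
        for order in permutations(CONSTITUENTS)
        if all(order.index(a) < order.index(b) for a, b in WEAK_PRECEDENCE)
    ]


def derivation_node(order: Tuple[str, ...]) -> Tuple[str, FrozenSet[str]]:
    """Order-free tree node: the parent label plus the set of its children."""
    return (LHS, frozenset(order))


orders = permitted_orders()
trees = {derivation_node(order) for order in orders}

for order in orders:
    print(" ".join(order))
print("distinct order-free trees:", len(trees))
```

Of the six permutations of the three constituents, three keep Masand Alya before Fail-e-Naqis, and all three map to the same order-free node, which mirrors the single derivation tree described above.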

Now consider another set of productions, for the
Failia [8] Urdu sentence. The sets V and T are the same as
discussed above. Production rules P with LP constraints and
lexical entries are given in Figure 3.

In rule 1 of Figure 3, Masand Alya is isolated and
continuous in the definition of Jumla-e-Failia, which may
not be the case in its own definition. Masand has no LP
constraint, which means that the constituent parts derived
from Masand (with their own LP constraints) can appear in
free order in Jumla-e-Failia.
Figure 2. Derivation Tree for Sample Ismia (~«!) 
Sentences (2 MS a2 GallS), (Sab 22 GslLS) and 
(a2 wl § oS 2h). 


[Rules 1 to 11, in Urdu script, are not legible in this copy.]


Figure 3. LSL rules for Failia Urdu sentence. 


Rule 2 specifies that Masand Alya can have isolated,
continuous constituent parts in any order. In rule 3,
Mafool has immediate precedence over Mutaliq-e-Mafool,
which means that Mafool must come before Mutaliq-e-Mafool
and that the rightmost terminal derived from Mafool stands
immediately to the left of the leftmost terminal derived
from Mutaliq-e-Mafool. All other parts of Masand are
isolated and continuous, in any order. The remaining rules
comprise lexical entries and simple rules with isolation
and continuity.

The following sentences are grammatically correct
according to the LSL grammar given in Figure 3.


. lS 8 ES oS! eile Gey 
mS 8 -S 5S) Gunga yee el 
ma lS Gus le gl 8 ES oS! 
. a Sel lye curs pS US 5S) 
1 c) a LS Gunga Lue eS ES 5S) 
why US Lye canga 8 2S OS 
7 Je AS OS) 2 UB Z! Guys Le 
. avS cl v8 S 5S! Cus pe 
. oo LS Nope Gas 9 eS ES DSI eI 
. | jue Cae gd 9 LS eS —S S81 al 


All the above word orders (and many others) are
represented by the LSL rules of Figure 3 and are therefore
grammatically correct. The constituent ordering of two of
the above sentences according to the LP constraints is
given below in Tables 4 and 5.


Table 4
[Rows: LP Constraints, Constituents, Sentence. The Urdu-script cell contents are not legible in this copy.]


Table 5
[Rows: LP Constraints, Constituents, Sentence. The Urdu-script cell contents are not legible in this copy.]


Again, there is only one derivation tree, shown in
Figure 4, for all the possible word orders given above,
because the order information is encapsulated within the
LP constraints of the LSL grammar.

The following sentence is not grammatically correct
according to the LSL rules of Figure 3: rule 1 states that
Masand Alya must be continuous, and the sentence given
below violates this by the discontinuity of
Ry ged \ya 22. 

ml yge eS 8 5S) 2 US gl Cus gs 
Figure 4. Derivation tree for all possible word orders
of the sentence (2 LS 6S ES SI gl Guy Ix)
according to LSL grammar of Figure 3.


It should be noted that writing a CFG for all of the
above word orders requires an excessive number of rules,
because every word order needs a separate rule.
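
The growth in the number of CFG rules can be made concrete with a small Python sketch. It is a hedged illustration only: the constituent labels A to E are hypothetical placeholders and are not taken from Figure 3, and cfg_right_hand_sides is a hypothetical helper. An order-free rule body with n constituents and no LP constraints corresponds to n! fixed-order right-hand sides, and each weak precedence constraint prunes that set.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def cfg_right_hand_sides(
    constituents: Sequence[str],
    weak_precedence: Sequence[Tuple[str, str]] = (),
) -> List[str]:
    """Expand one order-free rule body into the fixed-order bodies a CFG would list."""
    bodies = []
    for order in permutations(constituents):
        # Keep only the orders in which every (left, right) pair is respected.
        if all(order.index(a) < order.index(b) for a, b in weak_precedence):
            bodies.append(" ".join(order))
    return bodies


# Hypothetical constituent labels, used only to show the counts.
free_four = cfg_right_hand_sides(["A", "B", "C", "D"])
constrained_four = cfg_right_hand_sides(["A", "B", "C", "D"], [("A", "B")])
free_five = cfg_right_hand_sides(["A", "B", "C", "D", "E"])

print(len(free_four), len(constrained_four), len(free_five))  # prints: 24 12 120
```

Each of these cases is a single rule in LSL, with the ordering information carried by its LP constraints instead of by separate productions.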


3. LSL FOR SINDHI SENTENCES 
Now consider a simple Sindhi sentence (4 .\) .le) and
its LSL grammar, given below:
G = (V, T, S, P, L)
where
V = {lates pad Sl sa deli eld Al go J pede JS pede, glen ed 
cS! ga Jed paul} 
T = fesse ccotlns os A cau cle} 
S = {ska} 


Production rules P with LP constraints and lexical entries L
are given in Figure 5.

Rule 1 of Figure 5 specifies that the constituent parts
of the sentence can appear in any order, subject to the
isolation and continuity of Mubtada [9][10]. In the same
way, the LP constraints of rules 2, 3, 4 and 5 specify the
partial or free word order of the sentence constituents
with the necessary isolation and continuity.

Consider the sentence (# “| .e). Because of the partial
word order defined for the sentence (that is, the continuity
of Mubtada), the above sentence has only two proper word
orders:

» 3 

a: ge aes 

The derivation tree for the above sentences is given in
Figure 6. The constituent ordering of all proper word
orders is given below in Tables 6 and 7.
[Rules 1 to 12, in Sindhi script, are not legible in this copy.]


Figure 5. LSL rules for Sindhi Sentence.


Table 6
[Rows: LP Constraints, Constituents, Sentence. The Sindhi-script cell contents are not legible in this copy.]


Table 7
[Rows: LP Constraints, Constituents, Sentence. The Sindhi-script cell contents are not legible in this copy.]


The sentence “si (le (” is not grammatically
correct according to the above LSL grammar because it
violates the LP constraint of rule 1, which requires
Mubtada to be continuous.


Figure 6. Derivation tree for sentences (51 I (le) 
and (gle #1 SN) 16 


Now consider another sentence with more constituent
parts: “cdl gzg AS oe aul ¢) Curgs goign”. All the following
word orders are correct according to the LSL grammar of
Figure 5.


m cstlsis AK cso ppl g) Cus gd paige 
wm stlsis AK ceo pals @) saigie Cus 92 
stl sis AK (eo pple Cas gs saigie ¢/ 
= gl lns OAK Goo pals Cus gd paige 
: Cl ctl sis DAK om tls grigia Cau 92 
wm spighe Cus 59 @l atl as OAK om pales 
. Cu 9 sige @l dl sry AK om asls 
mu sd gaigie al gay 8K Coe pale cl 
1. Saige Cus $9 clots AS o> aale: ¢! 
m tlgay Cus 5 saligie @l aK co pals 
wm tlsas saigle Gus ga @l aK co pals 
mu gd saigie cal gig gl AK om ales 
1. ppigie Cus 9 cal gag gl aS Goo pile 


The constituent ordering of two of the above sentences
according to the LP constraints is given below in Tables 8
and 9.


Table 8
[Rows: LP Constraints, Constituents, Sentence. The Sindhi-script cell contents are not legible in this copy.]


Table 9
[Rows: LP Constraints, Constituents, Sentence. The Sindhi-script cell contents are not legible in this copy.]

Word order is not completely free in the above
sentences. For example, the following sentences are
grammatically incorrect according to the LSL grammar of
Figure 5.

m saigig ctl srs AS Co ails ¢) Cus 5a 
m cetlzs OAK ce pple Cag Gl paige 
m Cau 3 gl say JAK ope pale 7) sige 


In all the sentences given above, the continuity of
Mubtada required by rule 1 is violated. The derivation
tree for all constituent orders is shown in Figure 7.

The LSL rules of Figure 5 are identical to the Urdu rules
given in Figure 3 because of the similarity between Urdu
and Sindhi sentence structures.




Figure 7. Derivation Tree of a Sample Sentence 
(cel sas DAS coe pals g) Cus ga saigie) 


4. CONCLUSION 


Context-Free Grammars are not suitable for the
computational treatment of free or partial word order
syntax. Linear Specification Language grammar is an
extension of CFGs that can handle the partial or free word
order of natural language text. LP constraints define the
word orders of sentences, with or without restrictions.
LSL is capable of defining the partial word order of Urdu
and Sindhi sentences. A single LSL rule can describe many
proper word orders of a sentence, whereas a CFG needs a
different rule for every proper order of the words or
constituents. Once an LSL grammar is defined, it can be
used to validate all the different word or constituent
orders of a sentence, which amounts to checking the syntax
of many sentences with the same rule. One interesting
consequence is that only one derivation tree (also known
as a parse tree) needs to be constructed for all proper
word orders of a sentence, because the word order
information is embedded in the LSL rules in the form of LP
constraints. Another interesting observation from the LSL
examples is that the Urdu and Sindhi LSL grammars appear
to be identical, which strengthens the possibility of
grammar parallelism between these languages.